CASOM: SOM for Contingency Tables and Biplot

Author

  • Rodolphe Priam
Abstract

This article presents a new way of using self-organizing map methods to visualize qualitative data or histogram vectors, such as those found on the Internet, e.g. after pre-processing plain-text documents. The main difference from other known methods is the nature of the processed matrix: a contingency table. By adding constraints while learning a mixture of discrete distributions that models the noise within classes of documents (rows), we obtain a self-organizing map algorithm named CASOM. We explain the properties of the model: metrics, criteria, and links with Correspondence Analysis and the mean biplot, which help to interpret the results. A more general projection, available for self-organizing maps, in the dual Euclidean space (columns) is also introduced. We then present experiments on a corpus of short text summaries to illustrate the behavior of the algorithm and to show its usefulness. The conclusion discusses alternative models and outlines perspectives for the contribution.
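
For readers unfamiliar with the Correspondence Analysis that the abstract links CASOM to, the following is a minimal sketch of plain correspondence analysis of a contingency table, computed through a singular value decomposition of the standardized residuals. It is not the CASOM algorithm itself. The function name `correspondence_analysis` and the small document-by-term count table are hypothetical and chosen only for illustration.

```python
import numpy as np

def correspondence_analysis(N):
    """Plain correspondence analysis of a contingency table N (rows x columns).

    Illustrative sketch only; this is not the CASOM algorithm.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]      # principal coordinates of rows
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]   # principal coordinates of columns
    return row_coords, col_coords, sv ** 2           # sv**2 are the principal inertias

# Hypothetical table: 4 documents (rows) x 3 terms (columns)
counts = np.array([[10, 2, 1],
                   [8, 3, 0],
                   [1, 9, 7],
                   [0, 6, 12]])
rows, cols, inertia = correspondence_analysis(counts)
```

Plotting the first two columns of `rows` and `cols` on the same axes gives a joint display of documents and terms. The sum of `inertia` equals the chi-square statistic of the table divided by its grand total, which is the sense in which this projection uses the chi-square metric.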

Similar resources

Ratio Maps and Correspondence Analysis

We compare two methods for visualising contingency tables and develop a method called the ratio map which combines the good properties of both. The first is a biplot based on the logratio approach to compositional data analysis. This approach is founded on the principle of subcompositional coherence, which assures that results are invariant to considering subsets of the composition. The second ...

Non-symmetric Correspondence Analysis and Biplot

In this paper, Non-Symmetric Correspondence Analysis (NSCA, Lauro & D'Ambra, 1984; D'Ambra & Lauro, 1989, 1992) is proposed as a useful technique for evaluating contingency tables with a dependence structure, in particular within the context of comparing market share differences. Technical aspects of the method are discussed with a view towards application, giving special attention to the biplot ...

SOMbrero: An R Package for Numeric and Non-numeric Self-Organizing Maps

This paper presents SOMbrero, a new R package for self-organizing maps. Along with the standard SOM algorithm for numeric data, it implements self-organizing maps for contingency tables (“Korresp”) and for dissimilarity data (“relational SOM”), all relying on stochastic (i.e., on-line) training. It offers many graphical outputs and diagnostic tools, and comes with a user-friendly web graphical i...

Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A Case Study of Semi-Concentrated Doctoral Exam

Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...

Partial Association Components in Multi-way Contingency Tables and Their Statistical Analysis

In analyses of contingency tables made up of categorical variables, the study of the relationship between the variables is usually the major objective. So far, many association measures and association models have been used to measure the association structure present in the table. Although the association measures merely determine the degree of strength of association between the study varia...

Publication date: 2005